Using Multi-speaker TTS for ASR Adaptation
Adapting Automatic Speech Recognition (ASR) systems to diverse speakers, accents, and environments is a persistent challenge, particularly when data from underrepresented speaker groups is scarce. Multi-speaker Text-to-Speech (TTS) technology offers a promising solution by synthesizing diverse and controllable speech samples to augment training data and improve ASR performance.
The goal of this project is to explore how multi-speaker TTS models can be effectively utilized for ASR adaptation. The student will:
* Investigate state-of-the-art TTS models capable of generating speech with varied speaker characteristics (e.g., Tacotron, FastSpeech, VITS, XTTS).
* Generate synthetic speech datasets mimicking specific speaker groups, accents, or environments.
* Evaluate the impact of these datasets on ASR system performance, focusing on recognition accuracy and robustness to variability.
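Recognition accuracy in the evaluation step is commonly reported as word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words. The following is a minimal sketch in plain Python, not a prescribed scoring tool; the function names `edit_distance` and `corpus_wer` are chosen here for illustration, and text normalization (casing, punctuation) is assumed to have been done beforehand.

```python
from typing import Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """WER over (reference, hypothesis) pairs: total errors / total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        hyp = hypothesis.split()
        errors += edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references must contain at least one word")
    return errors / ref_words
```

Comparing `corpus_wer` for a baseline model and for a model trained with synthetic data, on the same test set, gives a first measure of whether the augmentation helped. Computing it separately per speaker group or accent shows where the gain comes from.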
Key tasks include:
* Experimenting with TTS models for speaker-specific and multi-speaker dataset generation.
* Integrating synthetic datasets into the ASR training pipeline.
* Evaluating the adapted ASR system on standard benchmarks and custom test cases.
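One simple way to integrate synthetic data into training is to mix it with the real data at a controlled ratio, so that the effect of the synthetic share can be studied. The sketch below assumes each utterance is a dictionary with `audio_filepath`, `duration`, and `text` keys, a layout modelled on the JSON-lines manifests used by some ASR toolkits. The function name `mix_manifests`, the `source` tag, and the file paths in the usage note are hypothetical.

```python
import random
from typing import Dict, List


def mix_manifests(
    real: List[Dict],
    synthetic: List[Dict],
    synthetic_fraction: float,
    seed: int = 0,
) -> List[Dict]:
    """Return a shuffled training list with roughly `synthetic_fraction` synthetic entries.

    All real entries are kept; synthetic entries are subsampled without
    replacement. If too few synthetic entries are available, all of them
    are used and the achieved fraction is lower than requested.
    """
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)  # fixed seed keeps experiments reproducible

    # Solve n_syn / (n_real + n_syn) = synthetic_fraction for n_syn.
    n_syn = round(len(real) * synthetic_fraction / (1.0 - synthetic_fraction))
    n_syn = min(n_syn, len(synthetic))
    chosen = rng.sample(synthetic, n_syn)

    # Tag each entry with its origin so results can be analysed per source.
    mixed = [dict(entry, source="real") for entry in real]
    mixed += [dict(entry, source="synthetic") for entry in chosen]
    rng.shuffle(mixed)
    return mixed
```

For example, with 6 real utterances and `synthetic_fraction=0.25`, two synthetic utterances are drawn, giving 8 training entries in total. Sweeping the fraction (for instance 0, 0.25, 0.5) while keeping the real data fixed is one way to see how much synthetic speech helps before it starts to hurt.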
Recommended toolkits: NVIDIA NeMo, ESPnet, Kaldi, and the Hugging Face TTS/ASR libraries. Python proficiency and familiarity with a deep learning framework (PyTorch or TensorFlow) are advantageous.
This project is suitable for students interested in speech synthesis, ASR, and the interplay between these fields. It offers opportunities to contribute to cutting-edge research, with potential extensions toward a thesis, conference publication, or advanced applications in multilingual and low-resource ASR systems.
Keywords: Multi-speaker TTS, ASR, domain adaptation, data augmentation, deep learning, transformer
Budapest University of Technology and Economics (BME), Department of Telecommunications and Artificial Intelligence (TMIT), 1117 Budapest, Magyar tudósok körútja 2. Tel.: (1) 463-2448; fax: (1) 463-3107; email: titkarsag@tmit.bme.hu